
Self-Anchor: Large Language Model Reasoning via Step-by-step Attention Alignment

Zhang, Hongxiang, Tian, Yuan, Zhang, Tianyi

arXiv.org Artificial Intelligence

To solve complex reasoning tasks for Large Language Models (LLMs), prompting-based methods offer a lightweight alternative to fine-tuning and reinforcement learning. However, as reasoning chains extend, critical intermediate steps and the original prompt will be buried in the context, receiving insufficient attention and leading to errors. In this paper, we propose Self-Anchor, a novel pipeline that leverages the inherent structure of reasoning to steer LLM attention. Self-Anchor decomposes reasoning trajectories into structured plans and automatically aligns the model's attention to the most relevant inference steps, allowing the model to maintain focus throughout generation. Our experiments show that Self-Anchor outperforms SOTA prompting methods across six benchmarks. Notably, Self-Anchor significantly reduces the performance gap between "non-reasoning" models and specialized reasoning models, with the potential to enable most LLMs to tackle complex reasoning tasks without retraining.
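The attention-steering idea can be illustrated with a minimal sketch. This is not the paper's actual mechanism: here a fixed additive bias is applied to raw attention scores at assumed "anchor" positions (e.g., tokens marking plan steps) before the softmax, so those positions keep receiving attention mass as the context grows. The function names and the bias value are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def anchored_attention(scores, anchor_positions, bias=2.0):
    """Re-weight raw attention scores by adding a fixed bias at
    positions marking plan steps, then normalize with softmax."""
    biased = scores.copy()
    biased[:, anchor_positions] += bias
    return softmax(biased)

# Toy example: one query over six key positions; positions 0 and 3
# stand in for plan-step anchors. With uniform raw scores, the
# anchored positions end up with the largest attention weights.
scores = np.zeros((1, 6))
attn = anchored_attention(scores, [0, 3])
```

In a real model the bias would be applied inside selected attention heads and derived from the decomposed plan structure rather than hard-coded.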


Conversational Planning for Personal Plans

Christakopoulou, Konstantina, Qu, Iris, Canny, John, Goodridge, Andrew, Adams, Cj, Chen, Minmin, Matarić, Maja

arXiv.org Artificial Intelligence

The language generation and reasoning capabilities of large language models (LLMs) have enabled conversational systems with impressive performance in a variety of tasks, from code generation, to composing essays, to passing STEM and legal exams, to a new paradigm for knowledge search. Beyond those short-term applications, LLMs are increasingly used to help with real-life goals or tasks that take a long time to complete, involving multiple sessions across days, weeks, months, or even years. Thus, to enable conversational systems for long-term interactions and tasks, we need language-based agents that can plan for long horizons. Traditionally, such capabilities were addressed by reinforcement learning agents with hierarchical planning capabilities. In this work, we explore a novel architecture where the LLM acts as the meta-controller deciding the agent's next macro-action, and tool-use-augmented LLM-based option policies execute the selected macro-action. We instantiate this framework for a specific set of macro-actions enabling adaptive planning for users' personal plans through conversation, with follow-up questions collecting user feedback. We show how this paradigm can be applicable in scenarios ranging from tutoring for academic and non-academic tasks to conversational coaching for personal health plans.
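The meta-controller/option-policy split can be sketched as a simple control loop. Everything below is illustrative: the macro-action names are invented, the meta-controller is a trivial rule standing in for an LLM call, and the option policies are stubs standing in for tool-augmented LLM policies.

```python
# Hypothetical macro-actions; the real system would have its own set.
MACRO_ACTIONS = ["ask_followup", "propose_plan", "revise_plan", "end_session"]

def meta_controller(state):
    """Stand-in for the LLM meta-controller: choose the next
    macro-action from the dialogue state (a rule instead of an LLM)."""
    if not state.get("goal"):
        return "ask_followup"
    if not state.get("plan"):
        return "propose_plan"
    if state.get("feedback") == "negative":
        return "revise_plan"
    return "end_session"

def run_turn(state):
    """One turn: pick a macro-action, then let its option policy
    (here, a stub) update the dialogue state."""
    action = meta_controller(state)
    policies = {
        "ask_followup": lambda s: {**s, "goal": "run a 10k"},
        "propose_plan": lambda s: {**s, "plan": ["week 1: 3x2km", "week 2: 3x3km"]},
        "revise_plan":  lambda s: {**s, "plan": s["plan"][:-1], "feedback": None},
        "end_session":  lambda s: s,
    }
    return action, policies[action](state)

# First turn of an empty session: no goal yet, so the controller
# elicits one with a follow-up question.
action, state = run_turn({})
```

The point of the pattern is that long-horizon decisions (which macro-action next) are separated from short-horizon execution (how to carry it out), mirroring hierarchical RL.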


Cocoa: Co-Planning and Co-Execution with AI Agents

Feng, K. J. Kevin, Pu, Kevin, Latzke, Matt, August, Tal, Siangliulue, Pao, Bragg, Jonathan, Weld, Daniel S., Zhang, Amy X., Chang, Joseph Chee

arXiv.org Artificial Intelligence

We present Cocoa, a system that implements a novel interaction design pattern -- interactive plans -- for users to collaborate with an AI agent on complex, multi-step tasks in a document editor. Cocoa harmonizes human and AI efforts and enables flexible delegation of agency through two actions: Co-planning (where users collaboratively compose a plan of action with the agent) and Co-execution (where users collaboratively execute plan steps with the agent). Using scientific research as a sample domain, we motivate the design of Cocoa through a formative study with 9 researchers while also drawing inspiration from the design of computational notebooks. We evaluate Cocoa through a user study with 16 researchers and find that when compared to a strong chat baseline, Cocoa improved agent steerability without sacrificing ease of use. A deeper investigation of the general utility of both systems uncovered insights into usage contexts where interactive plans may be more appropriate than chat, and vice versa. Our work surfaces numerous practical implications and paves new paths for interactive interfaces that foster more effective collaboration between humans and agentic AI systems.


Show and Guide: Instructional-Plan Grounded Vision and Language Model

Glória-Silva, Diogo, Semedo, David, Magalhães, João

arXiv.org Artificial Intelligence

Guiding users through complex procedural plans is an inherently multimodal task in which having visually illustrated plan steps is crucial to delivering effective plan guidance. However, existing plan-following language models (LMs) are often not capable of multimodal input and output. In this work, we present MM-PlanLLM, the first multimodal LLM designed to assist users in executing instructional tasks by leveraging both textual plans and visual information. Specifically, we bring cross-modality through two key tasks: Conversational Video Moment Retrieval, where the model retrieves relevant step-video segments based on user queries, and Visually-Informed Step Generation, where the model generates the next step in a plan, conditioned on an image of the user's current progress. MM-PlanLLM is trained using a novel multitask-multistage approach, designed to gradually expose the model to multimodal instructional-plan semantic layers, achieving strong performance on both multimodal and textual dialogue in a plan-grounded setting. Furthermore, we show that the model delivers cross-modal temporal and plan-structure representations aligned between textual plan steps and instructional video moments.


Multimodal Contextualized Plan Prediction for Embodied Task Completion

İnan, Mert, Padmakumar, Aishwarya, Gella, Spandana, Lange, Patrick, Hakkani-Tur, Dilek

arXiv.org Artificial Intelligence

Task planning is an important component of traditional robotics systems, enabling robots to compose fine-grained skills to perform more complex tasks. Recent work on translating natural language to executable actions for task completion in simulated embodied agents focuses on directly predicting low-level action sequences that would be expected to be directly executable by a physical robot. In this work, we instead focus on predicting a higher-level plan representation for one such embodied task completion dataset, TEACh, under the assumption that techniques for high-level plan prediction from natural language are more likely to transfer to physical robot systems. We demonstrate that better plans can be predicted using multimodal context, and that plan prediction and plan execution modules are likely dependent on each other, so it may not be ideal to fully decouple them. Further, we benchmark execution of oracle plans to quantify the scope for improvement in plan prediction models.


The Complexity of Plan Existence and Evaluation in Probabilistic Domains

Goldsmith, Judy, Littman, Michael L., Mundhenk, Martin

arXiv.org Artificial Intelligence

We examine the computational complexity of testing and finding small plans in probabilistic planning domains with succinct representations. We find that many problems of interest are complete for a variety of complexity classes: NP, co-NP, PP, NP^PP, co-NP^PP, and PSPACE. Of these, the probabilistic classes PP and NP^PP are likely to be of special interest in the field of uncertainty in artificial intelligence and are deserving of additional study. These results suggest a fruitful direction of future algorithmic development.
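The classes named in the abstract sit in a standard containment chain (these inclusions are textbook results, not claims from the paper itself). In particular, since $\mathrm{PP} \subseteq \mathrm{PSPACE}$ and $\mathrm{NP}^{\mathrm{PSPACE}} = \mathrm{PSPACE}$:

```latex
\mathrm{NP},\ \mathrm{coNP} \;\subseteq\; \mathrm{PP}
  \;\subseteq\; \mathrm{NP}^{\mathrm{PP}},\ \mathrm{coNP}^{\mathrm{PP}}
  \;\subseteq\; \mathrm{PSPACE}
```

So all of the completeness results cited fall within PSPACE, with the oracle classes $\mathrm{NP}^{\mathrm{PP}}$ and $\mathrm{coNP}^{\mathrm{PP}}$ capturing problems that combine nondeterministic guessing with probability-threshold counting.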


Exploiting Shared Resource Dependencies in Spectrum Based Plan Diagnosis

Gupta, Shekhar (Palo Alto Research Center) | Roos, Nico (Maastricht University) | Witteveen, Cees (Delft University of Technology) | Price, Bob (Palo Alto Research Center) | DeKleer, Johan (Palo Alto Research Center)

AAAI Conferences

In the case of a plan failure, plan-repair is a more promising solution than replanning from scratch. The effectiveness of plan-repair depends on knowledge of which plan action failed and why. Therefore, in this paper, we propose an Extended Spectrum Based Diagnosis approach that efficiently pinpoints failed actions. Unlike Model Based Diagnosis (MBD), it does not require fault models or behavioral descriptions of actions. Our approach first computes the likelihood of an action being faulty and subsequently proposes optimal probe locations to refine the diagnosis. We also exploit knowledge of plan steps that are instances of the same plan operator to optimize the selection of the most informative diagnostic probes. In this paper, we focus only on the diagnostic aspect of the plan-repair process.
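The core spectrum-based idea can be sketched with the Ochiai similarity coefficient, a standard choice in spectrum-based fault localization: rank each action by how strongly its participation across plan executions correlates with failures. This is a simplification for illustration; the paper's extended approach (probe selection, shared-operator knowledge) is not reproduced here.

```python
import math

def ochiai(spectrum, outcomes):
    """Rank actions by Ochiai similarity between each action's
    participation vector and the failure vector.
    spectrum[i][j] = 1 if action j took part in execution i;
    outcomes[i]    = 1 if execution i failed."""
    n_actions = len(spectrum[0])
    fails = sum(outcomes)
    scores = []
    for j in range(n_actions):
        ef = sum(1 for i, row in enumerate(spectrum) if row[j] and outcomes[i])
        ep = sum(1 for i, row in enumerate(spectrum) if row[j] and not outcomes[i])
        denom = math.sqrt(fails * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    return scores

# Three executions over three actions; action 1 participates in both
# failing executions and no passing one, so it ranks highest.
spectrum = [[1, 1, 0],
            [0, 1, 1],
            [1, 0, 1]]
outcomes = [1, 1, 0]
scores = ochiai(spectrum, outcomes)
```

Probes then serve to split the remaining ambiguity: after ranking, the diagnoser observes intermediate plan state where it most reduces uncertainty over the suspect set.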


Making Hybrid Plans More Clear to Human Users - A Formal Approach for Generating Sound Explanations

Seegebarth, Bastian (Ulm University) | Müller, Felix (Ulm University) | Schattenberg, Bernd (Ulm University) | Biundo, Susanne (Ulm University)

AAAI Conferences

Human users who execute an automatically generated plan want to understand the rationale behind it. Knowledge-rich plans are particularly suitable for this purpose, because they provide the means to give reason for causal, temporal, and hierarchical relationships between actions. Based on this information, focused arguments can be generated that constitute explanations on an appropriate level of abstraction. In this paper, we present a formal approach to plan explanation. Information about plans is represented as first-order logic formulae and explanations are constructed as proofs in the resulting axiomatic system. With that, plan explanations are provably correct w.r.t. the planning system that produced the plan. A prototype plan explanation system implements our approach and first experiments give evidence that finding plan explanations is feasible in real-time.


Modeling the Effects of Emotion on Cognition

Spraragen, Marc (University of Southern California)

AAAI Conferences

Understanding the interaction between emotion and cognitive processes is important for developing architectures for general intelligence, and vital for the fields of human social and behavioral modeling, game intelligence, and human-computer interaction. However, relatively little work in AI has been done on emotion in intelligent architectures, particularly on the effect of emotions on cognitive processes such as inference, planning and learning, despite research showing that emotion is a crucial and often beneficial factor in human decision-making. My work will provide a new emotional-cognitive architecture, focusing on a small set of theories, mechanisms and algorithms for the modeling of a wide array of emotional effects on human cognitive processes. The work and its results will be evaluated against current computational models of cognition and emotion, and validated by results from human cognitive science, neuroscience, and psychology.


SAT-Based Parallel Planning Using a Split Representation of Actions

Robinson, Nathan (NICTA and Griffith University) | Gretton, Charles (University of Birmingham) | Pham, Duc Nghia (NICTA) | Sattar, Abdul (NICTA and Griffith University)

AAAI Conferences

Planning based on propositional SAT(isfiability) is a powerful approach to computing step-optimal plans given a parallel execution semantics. In this setting: (i) a solution plan must be minimal in the number of plan steps required, and (ii) non-conflicting actions can be executed instantaneously in parallel at a plan step. Underlying SAT-based approaches is the invocation of a decision procedure on a SAT encoding of a bounded version of the problem. A fundamental limitation of existing approaches is the size of these encodings. This problem stems from the use of a direct representation of actions — i.e. each action has a corresponding variable in the encoding. A longtime goal in planning has been to mitigate this limitation by developing a more compact split — also termed lifted — representation of actions in SAT encodings of parallel step-optimal problems. This paper describes such a representation. In particular, each action and each parallel execution of actions is represented uniquely as a conjunct of variables. Here, each variable is derived from action pre- and post-conditions. Because multiple actions share conditions, our encoding of the planning constraints is factored and relatively compact. We find experimentally that our encoding yields a much more efficient and scalable planning procedure over the state-of-the-art in a large set of planning benchmarks.
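The condition-sharing idea behind such encodings can be sketched in miniature. The sketch below is not the paper's split representation: it still keeps per-action variables, but it shows the factoring that makes splitting pay off, namely that one SAT variable per (condition, step) is reused by every action whose precondition or effect mentions that condition. The toy actions and clause shapes are assumptions for illustration; clauses are plain integer lists in DIMACS style, with no solver attached.

```python
from itertools import count

# Toy STRIPS-like actions sharing conditions: (name, preconds, effects).
ACTIONS = [
    ("load",   ["at_depot"],          ["loaded"]),
    ("drive",  ["loaded"],            ["at_site"]),
    ("unload", ["at_site", "loaded"], ["delivered"]),
]

def condition_shared_encoding(actions, steps):
    """Allocate one variable per (condition, step) and per (action, step),
    and emit implication clauses action@t -> cond vars. Conditions shared
    across actions reuse a single variable, factoring the encoding."""
    ids = count(1)
    var = {}
    def v(key):
        if key not in var:
            var[key] = next(ids)
        return var[key]
    clauses = []
    for t in range(steps):
        for name, pre, eff in actions:
            a = v((name, t))
            for c in pre:
                clauses.append([-a, v((c, t))])      # a@t -> pre holds at t
            for c in eff:
                clauses.append([-a, v((c, t + 1))])  # a@t -> eff holds at t+1
    return var, clauses

# Two plan steps: drive's and unload's shared "loaded" precondition
# maps to the same variable, rather than being duplicated per action.
var, clauses = condition_shared_encoding(ACTIONS, 2)
```

In the full split representation the per-action variables themselves are replaced by conjunctions of these condition variables, which is where the major size reduction comes from.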